<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="http://egonw.github.io/blog/feed/by_tag/chem4word.xml" rel="self" type="application/atom+xml" /><link href="http://egonw.github.io/blog/" rel="alternate" type="text/html" /><updated>2026-04-06T12:06:23+00:00</updated><id>http://egonw.github.io/blog/feed/by_tag/chem4word.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">Extracting RDF from Chem4Word documents</title><link href="http://egonw.github.io/blog/2010/01/21/extracting-rdf-from-chem4word-documents.html" rel="alternate" type="text/html" title="Extracting RDF from Chem4Word documents" /><published>2010-01-21T00:00:00+00:00</published><updated>2010-01-21T00:00:00+00:00</updated><id>http://egonw.github.io/blog/2010/01/21/extracting-rdf-from-chem4word-documents</id><content type="html" xml:base="http://egonw.github.io/blog/2010/01/21/extracting-rdf-from-chem4word-documents.html"><![CDATA[<p><a href="http://jat45.wordpress.com/">Joe</a> has released the first <a href="http://research.microsoft.com/en-us/projects/chem4word/">Chem4Word</a>
<a href="http://jat45.files.wordpress.com/2010/01/example.docx">demo file</a>, and has written about how to
<a href="http://jat45.wordpress.com/2010/01/20/extracting-cml-from-a-chem4word-authored-document-java/">extract the CML with Java</a>
and <a href="http://jat45.wordpress.com/2010/01/21/extracting-cml-from-a-chem4word-authored-document-c/">with C#</a>.</p>

<p>I haven’t gotten around to trying the Java code yet, but I ran <a href="http://strigi.sf.net/">Strigi</a> against the demo file to extract RDF,
with the <a href="http://neksa.blogspot.com/2007/05/introduction.html">Strigi-Chemistry</a> plugins installed. This is part of the
<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a> that came out:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">&lt;example-doc.docx&gt;</span><span class="w">
  </span><span class="nl">&lt;http://freedesktop.org/standards/xesam/1.0/core#title&gt;</span><span class="w">
    </span><span class="s">"acetic acid"</span><span class="p">,</span><span class="w">
    </span><span class="s">"(8R,9S,10R,13S,14S,17S)- 17-hydroxy-10,13-dimethyl- 1,2,6,7,8,9,11,12,14,15,16,17-dodecahydrocyclopenta[a] phenanthren-3-one"</span><span class="p">,</span><span class="w">
    </span><span class="s">"testosterone"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://freedesktop.org/standards/xesam/1.0/core#version&gt;</span><span class="w">
    </span><span class="s">"2"</span><span class="p">,</span><span class="w">
    </span><span class="s">"2"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://rdf.openmolecules.net/0.9#atomCount&gt;</span><span class="w">
    </span><span class="s">"8"</span><span class="p">,</span><span class="w">
    </span><span class="s">"49"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://rdf.openmolecules.net/0.9#bondCount&gt;</span><span class="w">
    </span><span class="s">"7"</span><span class="p">,</span><span class="w">
    </span><span class="s">"52"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://rdf.openmolecules.net/0.9#molecularFormula&gt;</span><span class="w">
    </span><span class="s">"C2H4O2"</span><span class="p">,</span><span class="w">
    </span><span class="s">"C19H28O2"</span><span class="p">;</span><span class="w">
</span></code></pre></div></div>

<p>I believe there is quite a bit of room for improvement, but it’s a start :) Thanks to Joe for posting the public domain test file, so
that other projects can start playing with this exciting new technology. I should note, however, that I am not running a Microsoft OS
nor MS-Word, and the source of the saved documents is the only way I have to access the
<a href="http://en.wikipedia.org/wiki/Chemical_Markup_Language">CML</a> right now.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cml" /><category term="java" /><category term="rdf" /><category term="chem4word" /><category term="strigi" /><summary type="html"><![CDATA[Joe has released the first Chem4Word demo file, and has written about how to extract the CML with Java and with C#.]]></summary></entry></feed>